Extracting clinical care pathways correlated with outcomes

ABSTRACT

Systems and methods for data analysis include constructing patient traces as a set of medical events for each patient of a patient population, the patient population being segmented based on patient outcomes. Medical events in one or more of the patient traces are reduced to provide processed patient traces. The processed patient traces are clustered to identify a cluster of patient traces. A process model is mined, using a processor, representing an aggregation of treatment pathways in the patient traces from the cluster. Patterns from patient traces are identified that are discriminative of patient outcomes. At least one of the patterns is represented with respect to the process model to identify treatment pathways correlated with the patient outcomes.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to commonly assigned U.S. application Ser. No. 13/851,675, entitled “EXTRACTING KEY ACTION PATTERNS FROM PATIENT EVENT DATA,” filed Mar. 27, 2013, and commonly assigned U.S. application Ser. No. 13/851,618, entitled “CLUSTERING BASED PROCESS DEVIATION DETECTION,” filed Mar. 27, 2013, both of which are incorporated herein by reference in their entirety.

This application is a Continuation application of co-pending U.S. patent application Ser. No. 13/851,755 filed on Mar. 27, 2013, incorporated herein by reference in its entirety.

BACKGROUND

1. Technical Field

The present invention relates to analysis of patient data, and more particularly to extracting clinical care pathways correlated with outcomes.

2. Description of the Related Art

Identifying care pathways correlated with patient outcomes from patient event data is important for gaining insight into which care pathways will lead to positive or negative outcomes. Once identified, such care pathways could be used to refine care plan descriptions for treating particular diseases, such as congestive heart failure. However, real-world raw patient event data suffers from a number of potential problems. For example, it is common for multiple events to occur concurrently, causing pattern explosion. Another problem is that the diversity of events could be explosive. These problems may cause loops and spaghetti-like patterns in the patient event data when a process model is mined. Existing process mining approaches do not correlate clinical pathways with patient outcomes. In addition, there is no existing research that provides for the overlay of clinical pathways correlated with patient outcomes on a mined model of patient event traces.

SUMMARY

A method for data analysis includes constructing patient traces as a set of medical events for each patient of a patient population, the patient population being segmented based on patient outcomes. Medical events in one or more of the patient traces are reduced to provide processed patient traces. The processed patient traces are clustered to identify a cluster of patient traces. A process model is mined, using a processor, representing an aggregation of treatment pathways in the patient traces from the cluster. Patterns from patient traces are identified that are discriminative of patient outcomes. At least one of the patterns is represented with respect to the process model to identify treatment pathways correlated with the patient outcomes.

A system for data analysis includes a medical records database configured to construct patient traces stored on a computer readable storage medium as a set of medical events for each patient of a patient population, the patient population being segmented based on patient outcomes. A trace preprocess module is configured to reduce medical events in one or more of the patient traces to provide processed patient traces. A cluster module is configured to cluster the processed patient traces to identify a cluster of patient traces. A pathway extraction module is configured to mine a process model representing an aggregation of treatment pathways in the patient traces from the cluster. A pattern extraction module is configured to identify patterns from patient traces that are discriminative of patient outcomes. A visual interface is configured to represent at least one of the patterns with respect to the process model to identify treatment pathways correlated with the patient outcomes.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:

FIG. 1 is a block/flow diagram showing a system/method for extracting clinical care pathways correlated with patient outcomes, in accordance with one illustrative embodiment;

FIG. 2 shows an exemplary process model, in accordance with one illustrative embodiment;

FIG. 3 shows an exemplary process model with a discriminative pattern overlaid, in accordance with one illustrative embodiment; and

FIG. 4 is a block/flow diagram showing a system/method for extracting clinical care pathways correlated with patient outcomes, in accordance with one illustrative embodiment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In accordance with the present principles, systems and methods for extracting clinical care pathways correlated with outcomes are provided. Patient traces are constructed as sets of medical events for each patient. The patient traces are preprocessed to reduce events in the patient traces and thereby reduce complexity. Preprocessed patient traces are then clustered and a cluster is identified, such as, e.g., the largest cluster, to remove patient outliers. Process mining is performed to mine a process model representing aggregated clinical treatment pathways from the patient traces of the cluster. Discriminative patterns are mined, e.g., from the preprocessed patient traces to identify patterns that are discriminative of patient outcomes. The discriminative patterns are overlaid on the process model to identify clinical pathways that are correlated with a particular patient outcome.

The present principles provide a visual overlay of discriminative patterns with respect to the process model to enable a user to identify one or more discriminative patterns in the context of the end-to-end clinical care pathways. One advantage of the present principles is that a user can identify the key clinical practice pathways that are correlated with positive or negative outcomes on the mined model. Insight can be obtained by comparing and contrasting separate overlays of clinical practice pathways correlated to positive and negative patient outcomes.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Referring now to the drawings in which like numerals represent the same or similar elements and initially to FIG. 1, a block/flow diagram showing a system for extracting clinical care pathways correlated with patient outcomes 100 is illustratively depicted in accordance with one embodiment. The system 100 may analyze data (e.g., patient data) to identify care pathways correlated with patient outcomes.

While the present principles are described in terms of healthcare, it should be understood that the present principles are not so limited. Rather, other applications are also contemplated within the scope of the present principles, such as, e.g., insurance.

The system 100 may include a system or workstation 102. The system 102 preferably includes one or more processors 110 and memory 112 for storing patient data, applications, modules and other data. The system 102 may also include a visual interface 104, which may include one or more displays 106 for viewing. The displays 106 may permit a user to interact with the system 102 and its components and functions. This may be further facilitated by a user interface 108, which may include a mouse, joystick, or any other peripheral or control to permit user interaction with the system 102 and/or its devices. It should be understood that the components and functions of the system 102 may be integrated into one or more systems or workstations.

The system 102 may receive input 114, which may include, e.g., health care event data for a cohort of patients stored in a medical records database, such as, e.g., electronic medical records (EMR) 118. The patient cohort may be defined by a user (e.g., physician). For example, the patient cohort may include outputs of risk stratification procedures. Health care event data may include patient demographics, physician notes, immunizations, radiology reports, etc. EMR 118 hierarchically stores health care event data as medical events, such as, e.g., medications, labs, diagnoses, vital signs, etc., as well as patient outcomes. The patient cohort may be segmented by outcome according to a criterion, e.g., into positive and negative outcomes. For example, patients not hospitalized for congestive heart failure within one year after diagnosis may be a positive outcome, while patients hospitalized for congestive heart failure within one year after diagnosis may be a negative outcome. Other types of segmentation may also be employed. A patient trace is constructed for each patient as a set of ordered events (e.g., chronologically) leading to a patient outcome. Each patient trace may include attributes for each event, such as, e.g., event names, event timestamps, etc.
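As a minimal sketch of the trace construction described above, the following Python groups flat event records by patient, orders each group chronologically, and segments the resulting traces by outcome. The record layout, the event names, the outcome labels and the function names `build_traces` and `segment_by_outcome` are illustrative assumptions, not part of the described system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flat event records: (patient_id, event_name, event_date).
events = [
    ("p1", "Diagnosis:HeartFailure", date(2012, 1, 5)),
    ("p1", "Lab:BNP", date(2012, 1, 3)),
    ("p2", "Medication:BetaBlocker", date(2012, 2, 1)),
    ("p2", "Vital", date(2012, 1, 20)),
]
# Hypothetical outcome label for each patient.
outcomes = {"p1": "negative", "p2": "positive"}


def build_traces(records):
    """Group events by patient and order each group chronologically."""
    grouped = defaultdict(list)
    for patient_id, name, when in records:
        grouped[patient_id].append((when, name))
    return {pid: [name for _, name in sorted(evs)] for pid, evs in grouped.items()}


def segment_by_outcome(traces, labels):
    """Split the traces into one dictionary per outcome label."""
    segments = defaultdict(dict)
    for pid, trace in traces.items():
        segments[labels[pid]][pid] = trace
    return dict(segments)


traces = build_traces(events)
segments = segment_by_outcome(traces, outcomes)
print(segments)
```

Keeping the timestamp alongside each event until after sorting is what makes the chronological ordering explicit; a fuller version would retain the timestamps as trace attributes.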

Trace preprocess module 120 is configured to preprocess the patient traces due to the big data nature of medical records. Trace preprocess module 120 reduces the number of events in a patient trace by performing, e.g., filtering, aggregating events of a concurrent event and consolidating consecutive identical events. Other forms of preprocessing are also contemplated.

Trace preprocess module 120 may be configured to filter the patient traces. Patient data are hierarchically stored in EMR 118 in terms of medical events. For example, diagnosis events may be stored as a diagnosis hierarchy involving the levels, from highest to lowest: hierarchy name, Hierarchical Condition Categories (HCC) code, Diagnosis (DX) group names and the International Classification of Diseases, 9th Revision (ICD9). In another example, medication events may be stored in a medication hierarchy, involving the levels, from highest to lowest: pharmacy class, pharmacy subclass and ingredient. Other hierarchical arrangements are also applicable. Trace preprocess module 120 filters patient traces by replacing event names with their hierarchical categorical names to reduce the diversity of events. Hierarchical categorical names may be obtained from the EMR 118, ontologies, etc. Trace preprocess module 120 may also filter out events by type or attribute, etc. For example, all medication names may be replaced by their Pharmacy Subclass names, all diagnosis names may be replaced by DX group names, lab events may be filtered to include labs for congestive heart failure, etc. Other types of filtering may also be implemented.
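The filtering step can be sketched as a lookup that replaces each event name with a higher-level category and drops unwanted events. The mapping `CATEGORY`, the set `DROPPED` and the function `filter_trace` below are hypothetical; in practice the categories would come from the EMR hierarchy or an ontology.

```python
# Hypothetical mapping from a specific event name to its category name.
CATEGORY = {
    "Metoprolol": "BetaBlockers",
    "Carvedilol": "BetaBlockers",
    "Furosemide": "LoopDiuretics",
}
# Hypothetical event names that are filtered out entirely.
DROPPED = {"FluShot"}


def filter_trace(trace, category=CATEGORY, dropped=DROPPED):
    """Drop unwanted events and generalize the rest to category names."""
    return [category.get(event, event) for event in trace if event not in dropped]


raw = ["Metoprolol", "FluShot", "Vital", "Carvedilol"]
print(filter_trace(raw))  # ['BetaBlockers', 'Vital', 'BetaBlockers']
```

Events without a category entry pass through unchanged, so the mapping can be introduced incrementally for one event type at a time.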

Trace preprocess module 120 may also be configured to aggregate events of a concurrent event. Due to resolution limits of temporal data (e.g., one day) in EMR 118, patient traces often involve complex events. Other time periods are also contemplated. For example, during a day, a patient may encounter multiple medical events. These medical events occurring within a same day are treated as same day concurrent events (SDCEs) due to the resolution of temporal data. However, increasing numbers of events within an SDCE may lead to a dramatic increase in patient traces, as all combinations of events must be accounted for. To address this issue of pattern explosion, trace preprocess module 120 aggregates the events of the SDCEs into super events. In this way, the number of events in an SDCE is reduced.

First, clinical event packages are identified from each SDCE (e.g., using frequent itemset mining). Clinical event packages are sets of events that have a certain frequency of occurrence among all SDCEs. A two-way sorting approach is then applied to aggregate events within SDCEs as super events based on the identified clinical event packages. Clinical event packages identified from an SDCE are first sorted according to cardinality. Clinical event packages with a same cardinality are then sorted by appearance frequency. The clinical event package having the largest cardinality is selected as a super event. Where multiple clinical event packages have the same largest cardinality, the one with the highest appearance frequency is selected as the super event. This process is repeated for the remaining events in the SDCE. By grouping events within an SDCE as super events, the number of events in an SDCE is thereby reduced.
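The two-way sorting can be sketched as a greedy loop. The example below is a simplified illustration: `mine_packages` is a naive stand-in for frequent itemset mining (it enumerates every subset, which is only workable for small SDCEs), and the function names, the `+` separator and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_packages(sdces, min_support):
    """Count every event subset of size >= 2 and keep the frequent ones."""
    counts = Counter()
    for sdce in sdces:
        items = sorted(set(sdce))
        for size in range(2, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {pkg: n for pkg, n in counts.items() if n >= min_support}


def aggregate_sdce(sdce, packages):
    """Greedily replace events with super events: largest package first,
    ties broken by appearance frequency, repeated on the remaining events."""
    remaining = set(sdce)
    super_events = []
    while True:
        candidates = [p for p in packages if p <= remaining]
        if not candidates:
            break
        best = max(candidates, key=lambda p: (len(p), packages[p], tuple(sorted(p))))
        super_events.append("+".join(sorted(best)))
        remaining -= best
    # Events not covered by any package are kept as they are.
    return super_events + sorted(remaining)


sdces = [["Lab", "Vital", "Med"], ["Lab", "Vital"], ["Lab", "Vital", "Dx"], ["Med", "Dx"]]
packages = mine_packages(sdces, min_support=2)
print(aggregate_sdce(["Lab", "Vital", "Med"], packages))  # ['Lab+Vital', 'Med']
```

The final tuple in the sort key only makes ties deterministic; it is not part of the described approach.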

Trace preprocess module 120 may further be configured to remove or consolidate consecutive identical events. Consecutive identical events may suggest some routine check or periodical treatment and therefore these events may be treated similarly. However, the temporal event patterns of repeating events are not as informative. Consecutive identical events can be removed or consolidated to eliminate event self-loops in detected patterns. Consolidated consecutive identical events may be distinguished by, e.g., adding the prefix “Rep,” adding the suffix “-Repeat,” etc. For example, vital events that occur consecutively for the same patient can be consolidated as Vital-Repeat in the patient's trace.
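A short sketch of the consolidation step, using the “-Repeat” suffix mentioned above; the function name `consolidate` is an assumption made for illustration.

```python
from itertools import groupby


def consolidate(trace, suffix="-Repeat"):
    """Collapse each run of consecutive identical events into one event,
    marking runs longer than one with the given suffix."""
    result = []
    for name, run in groupby(trace):
        length = sum(1 for _ in run)
        result.append(name + suffix if length > 1 else name)
    return result


print(consolidate(["Vital", "Vital", "Vital", "Lab", "Vital"]))
# ['Vital-Repeat', 'Lab', 'Vital']
```

Only adjacent repeats are merged, so the trailing single Vital event in the example keeps its original name.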

Cluster module 122 is configured to cluster the preprocessed patient traces to identify a dominant set of similar patient traces to thereby remove patient outliers. In one embodiment, for example, where there is a small amount of data, the cluster module 122 may be skipped and the preprocessed patient traces may proceed to pathway extraction module 124. Cluster module 122 clusters patient traces based on their execution similarity. The execution of a patient trace refers to the set and order of medical events, such as, e.g., medications, labs, vitals, and diagnoses. Patient traces are transformed into string-based representations and a density-based clustering is applied using a string edit-distance metric.

In one embodiment, patient traces are transformed into strings by mapping all known event types to, e.g., Unicode characters. Table 1 shows an exemplary mapping of event types to characters.

TABLE 1: Exemplary mapping of event types to Unicode characters.

Event Type          Mapped Character
OrderReceived       A
ShipmentCreated     B
TransportStarted    C
TransportEnded      D
InvoiceIssued       E

Each event in each patient trace (T1, T2, . . . , Tn) is then replaced with the corresponding mapped character. The ordering of the resulting string representation of each patient trace corresponds to the ordering of events for that patient trace (e.g., by time). Table 2 shows an exemplary string representation of patient traces according to the mapping of Table 1. Each patient trace in its string representation is then treated as a point when computing patient trace clusters.

TABLE 2: Exemplary string representation of patient traces.

Trace                                                                                              String Representation
OrderReceived → ShipmentCreated → TransportStarted → TransportEnded → InvoiceIssued                ABCDE
OrderReceived → ShipmentCreated → TransportStarted → ShipmentCreated → TransportStarted → . . .    ABCBC
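The mapping of Tables 1 and 2 can be reproduced with a few lines of Python. This is a sketch only: the helper names `build_alphabet`, `encode` and `edit_distance` are assumptions, characters are assigned in order of first appearance starting at “A”, and the Levenshtein distance is used as one common choice of string edit-distance metric.

```python
def build_alphabet(traces):
    """Assign one character per event type, in order of first appearance."""
    alphabet = {}
    for trace in traces:
        for event in trace:
            if event not in alphabet:
                alphabet[event] = chr(ord("A") + len(alphabet))
    return alphabet


def encode(trace, alphabet):
    """Replace every event in the trace with its mapped character."""
    return "".join(alphabet[event] for event in trace)


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


t1 = ["OrderReceived", "ShipmentCreated", "TransportStarted", "TransportEnded", "InvoiceIssued"]
t2 = ["OrderReceived", "ShipmentCreated", "TransportStarted", "ShipmentCreated", "TransportStarted"]
alphabet = build_alphabet([t1, t2])
s1, s2 = encode(t1, alphabet), encode(t2, alphabet)
print(s1, s2, edit_distance(s1, s2))  # ABCDE ABCBC 2
```

Counting upward from “A” eventually leaves the Latin letters, which is harmless here because any distinct Unicode character works as a symbol.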

Clustering may include performing, e.g., DBSCAN (density-based spatial clustering of applications with noise), k-nearest neighbor clustering (k-NN), etc. Other clustering approaches may also be employed. The results are one or more clusters of patient traces that share a similar behavior. Clustering receives one or more parameters (e.g., epsilon) as an input indicating the maximum distance between points allowable in a cluster.
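For illustration, a compact pure-Python DBSCAN over trace strings is sketched below, with `eps` playing the role of the epsilon parameter. The `min_pts` parameter (which counts the point itself), the function names and the sample strings are assumptions; a production system would more likely rely on an existing clustering library.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise (outliers)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if edit_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


strings = ["ABCDE", "ABCDF", "ABCE", "ABCDE", "ZZZZZZZZ"]
labels = dbscan(strings, eps=1, min_pts=2)
largest = Counter(l for l in labels if l != -1).most_common(1)[0][0]
print(labels, largest)  # [0, 0, 0, 0, -1] 0
```

Selecting the most common non-noise label corresponds to keeping the largest cluster and discarding outlier traces.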

Pathway extraction module 124 is configured to mine a process model from the patient traces of a cluster. Preferably, the process model is mined from the largest cluster. However, the process model may be mined from other clusters, such as, e.g., clusters that include a number of patient traces that meet or exceed a threshold amount, etc. Based on patient traces of medical events, models can be extracted that describe underlying processes. A business process model shows a specific ordering of work activities with a beginning, an end, and clearly indicated inputs and outputs. In one embodiment, a process model can be represented in terms of a Petri net, which is a formal, graphical, executable technique for the specification and analysis of concurrent, discrete-event dynamic systems. In another embodiment, a process model can be represented in Business Process Modeling Notation (BPMN). Other representations may also be employed.

The process model in accordance with the present principles is an aggregation of patient traces to form a model of aggregated clinical patient treatment pathways from all relevant patient traces. An artificial start event and artificial end event are injected in each patient trace. This allows the resulting mined process model to include start and end nodes. The start event may be given a timestamp that occurs before the earliest event in that patient trace. The end event may be given a timestamp that occurs after the last event in the patient trace. Pathway extraction module 124 mines process models by applying, e.g., the HeuristicsMiner technique. The HeuristicsMiner technique addresses mining of traces that could be incomplete and may contain noise. HeuristicsMiner computes an edge frequency (a number between 0 and 1) to indicate the confidence in an edge. HeuristicsMiner provides a number of heuristic rules that rely on the frequency of edges to infer ordering relations that determine the semantics of the underlying process model captured by the traces. Other process mining techniques are also contemplated.
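The following sketch shows the start/end injection and a simple edge score between 0 and 1. It is not the HeuristicsMiner technique itself: the score used here, the fraction of traces in which a directly-follows edge appears, is a simplified stand-in chosen for illustration, and the `START` and `END` labels and function names are assumptions. Traces are taken to be already ordered, so the artificial events are simply prepended and appended rather than given timestamps.

```python
from collections import Counter

START, END = "START", "END"  # hypothetical labels for the artificial events


def add_start_end(trace):
    """Inject the artificial start and end events into an ordered trace."""
    return [START] + list(trace) + [END]


def edge_confidence(traces):
    """Fraction of traces in which each directly-follows edge appears."""
    edge_traces = Counter()
    for trace in traces:
        padded = add_start_end(trace)
        edge_traces.update(set(zip(padded, padded[1:])))
    total = len(traces)
    return {edge: count / total for edge, count in edge_traces.items()}


confidence = edge_confidence([["Vital", "Lab"], ["Vital", "Med"]])
print(confidence[(START, "Vital")])  # 1.0
print(confidence[("Vital", "Lab")])  # 0.5
```

Because every trace begins with the start event and finishes with the end event, the resulting graph has a single entry node and a single exit node.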

Referring for a moment to FIG. 2, an exemplary process model is shown in accordance with one illustrative embodiment. Events include, e.g., lab panels (LabPanelA, LabPanelB, etc.), medications (AntianginalAgents4, BetaBlockers2, Biuretics3, etc.) and diagnoses (heartfailure). Events are represented as nodes. Dependencies between nodes are represented as edges. The process model has a start and end node to indicate the start and end of aggregation of treatment pathways.

Referring back to FIG. 1, pathway extraction module 124 may also refine the process model to alter the complexity of the process model (e.g., according to a preferred graph density or sparsity). Process model refinement may include varying the dependency measure or the minimum number of observations. Other process model refinements may also be employed.

Process model refinement may include varying the dependency measure. The dependency edge between repeating event node pairs is defined based on the dependency measure (e.g., threshold). Repeating event node pairs refers to two event nodes connected by an edge, occurring multiple times regardless of the direction of dependency of the edge. For repeating event node pairs, the dependency edge between the two event nodes is defined by comparing the frequency of occurrence of directions of dependencies for all repeating event node pairs to a threshold. For example, given repeating event node pairs A and B, the frequency of occurrence of (A→B) is compared with the frequency of occurrence of (B→A). If the frequency of occurrence of (A→B) exceeds (B→A) by a predefined threshold, the dependency edge between event nodes A and B is represented as (A→B). Similarly, if the frequency of occurrence of (B→A) exceeds (A→B) by a predefined threshold, the dependency edge between event nodes A and B is represented as (B→A).
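The direction test can be sketched as a comparison of directly-follows counts. Reading “exceeds by a predefined threshold” as “the difference in counts is at least the threshold” is an assumption, as are the function names and sample traces.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often each event is immediately followed by another."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def edge_direction(counts, a, b, threshold):
    """Return the dominant dependency edge between a and b, or None
    when neither direction exceeds the other by the threshold."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if ab - ba >= threshold:
        return (a, b)
    if ba - ab >= threshold:
        return (b, a)
    return None


counts = directly_follows([["A", "B"], ["A", "B"], ["A", "B"], ["B", "A"]])
print(edge_direction(counts, "A", "B", threshold=2))  # ('A', 'B')
print(edge_direction(counts, "A", "B", threshold=3))  # None
```

Raising the threshold removes edges whose direction is ambiguous, which is one way to thin out a dense model.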

Process model refinement may include varying the threshold on the minimum number of observations identified in process mining to display a node. For example, if the threshold on the minimum number of observations is specified as 10, then at least 10 patient traces must contain a node for that node to be shown in the process model mined by the mining algorithm. Every time the threshold on the minimum number of observations is changed, the process model is mined again from the traces.
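A sketch of the minimum-observation filter follows, counting each node once per trace that contains it; the function name `visible_nodes` and the toy traces are assumptions.

```python
from collections import Counter


def visible_nodes(traces, min_observations):
    """Nodes contained in at least `min_observations` traces."""
    counts = Counter()
    for trace in traces:
        counts.update(set(trace))  # count a node once per trace
    return {node for node, n in counts.items() if n >= min_observations}


traces = [["Vital", "Lab"], ["Vital", "Med"], ["Vital", "Lab", "Lab"]]
print(sorted(visible_nodes(traces, min_observations=2)))  # ['Lab', 'Vital']
```

In this sketch only the node filter is shown; re-mining the model after a threshold change would repeat the full mining step on the traces.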

Pattern extraction module 126 is configured to extract patterns from the preprocessed patient traces (from trace preprocess module 120) and identify the patterns that are discriminative for a patient outcome. Pattern extraction 126 may be performed separately (e.g., in parallel, successively, etc.) from pathway extraction 124. One goal of extracting discriminative patterns is to detect patterns from the patient traces that are both frequent and discriminative. Patterns should be frequent in that they should appear in a certain portion of the patient population. Patterns should also be discriminative in that they should be correlated with different outcomes.

Pattern extraction module 126 receives preprocessed patient traces from trace preprocess module 120 and outputs a set of identified patterns. Pattern extraction module 126 may apply any subsequence mining technique, such as, e.g., PrefixSpan or SPAM (sequential pattern mining). Pattern mining may be based on an inputted support value to specify how frequent the final detected patterns are to be. Pattern extraction module 126 mines patterns from patient traces with different outcomes and identifies patterns that are frequent with one type of outcome but scarce with the other.
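To make the “frequent with one outcome but scarce with the other” criterion concrete, the sketch below restricts patterns to ordered event pairs. This is a deliberately simplified stand-in for a full sequential pattern miner; the thresholds `min_support` and `max_other`, the function names and the sample traces are assumptions.

```python
from itertools import combinations


def ordered_pairs(trace):
    """All (earlier, later) event pairs occurring in a trace."""
    return {(trace[i], trace[j]) for i, j in combinations(range(len(trace)), 2)}


def support(pattern, traces):
    """Fraction of traces that contain the pattern."""
    return sum(pattern in ordered_pairs(t) for t in traces) / len(traces)


def discriminative_pairs(positive, negative, min_support, max_other):
    """Pairs frequent in one outcome group and scarce in the other."""
    candidates = set()
    for trace in positive + negative:
        candidates |= ordered_pairs(trace)
    result = {}
    for pattern in candidates:
        sp, sn = support(pattern, positive), support(pattern, negative)
        if sp >= min_support and sn <= max_other:
            result[pattern] = "positive"
        elif sn >= min_support and sp <= max_other:
            result[pattern] = "negative"
    return result


positive = [["Vital", "Lab", "Med"], ["Vital", "Med"]]
negative = [["Med", "Vital"], ["Lab", "Vital"]]
found = discriminative_pairs(positive, negative, min_support=0.75, max_other=0.25)
print(found)  # {('Vital', 'Med'): 'positive'}
```

The `min_support` argument plays the role of the inputted support value, while `max_other` bounds how often the pattern may appear in the opposite outcome group.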

Patterns may be represented as bag-of-pattern vectors for each patient trace. First, patterns are organized into a pattern dictionary of size m, where m is the number of different event patterns in the patient traces. The bag-of-pattern vector for each patient trace is an m-dimensional vector, where the value of the i-th dimension represents the frequency of occurrence of the i-th event pattern in that patient trace.
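A bag-of-pattern vector can be sketched as one occurrence count per dictionary entry. Counting contiguous occurrences is an assumption made to keep the example short, as are the function names and the sample dictionary.

```python
def count_occurrences(trace, pattern):
    """Number of positions at which the pattern occurs contiguously."""
    k = len(pattern)
    return sum(tuple(trace[i:i + k]) == tuple(pattern)
               for i in range(len(trace) - k + 1))


def bag_of_patterns(trace, dictionary):
    """m-dimensional vector: one occurrence count per dictionary pattern."""
    return [count_occurrences(trace, pattern) for pattern in dictionary]


dictionary = [("Vital", "Lab"), ("Lab", "Med"), ("Med", "Vital")]
trace = ["Vital", "Lab", "Med", "Vital", "Lab"]
print(bag_of_patterns(trace, dictionary))  # [2, 1, 1]
```

With a large dictionary most entries of such a vector are zero for any single trace, which is the sparsity issue discussed in the surrounding description.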

Pattern mining often results in a large number of patterns. As such, the constructed bag-of-pattern representations may be very sparse, since most patterns do not occur in most patient traces. However, if vectors are too sparse, computational models may not be meaningful. Pattern extraction module 126 may apply hierarchical pattern summarization to compress the pattern set.

Hierarchical pattern summarization merges detected pattern pairs in a hierarchical (or recursive) way. A pattern pair may be merged as a single pattern, and the dependency between events may be ignored. A pattern pair refers to a pair of patterns having the same events, but different (e.g., opposite) dependency edge directions. For example, if (A→B) is a detected pattern, and (B→A) is also a detected pattern, then the patterns (A→B) and (B→A) can be merged as a single pattern (A;B) and the order between them can be ignored. The bag-of-pattern vector representation of the resultant pattern after merging has frequencies that are equal to the sum of the individual patterns it merged from. Hierarchical pattern summarization can be repeated for all pattern pairs.
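One pass of the merge can be sketched as follows for a single bag-of-pattern vector. Using the sorted events of a pattern as its order-insensitive key, and the function name `summarize`, are assumptions; the recursive, hierarchical repetition is not shown.

```python
def summarize(dictionary, vector):
    """Merge patterns that contain the same events in a different order,
    summing their frequencies."""
    merged = {}
    for pattern, count in zip(dictionary, vector):
        key = tuple(sorted(pattern))  # (A, B) and (B, A) share one key
        merged[key] = merged.get(key, 0) + count
    return merged


dictionary = [("A", "B"), ("B", "A"), ("A", "C")]
vector = [2, 1, 4]
print(summarize(dictionary, vector))  # {('A', 'B'): 3, ('A', 'C'): 4}
```

The merged entry for (A;B) carries the sum 2 + 1 = 3, which matches the rule that the merged frequencies equal the sum of the individual patterns.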

Pattern extraction module 126 identifies discriminative patterns for a patient outcome by outcome analysis, which may include, e.g., sparse logistic regression, etc. Patterns may be extracted from the set of preprocessed patient traces (i.e., from trace preprocess module 120), from a cluster of patient traces (i.e., from cluster module 122), etc. The extracted patterns are preferably preprocessed to filter the patterns before outcome analysis is performed. Filtering may be based on, e.g., odds ratio, information gain, etc.
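As an illustration of odds-ratio filtering, the sketch below computes an odds ratio per pattern from presence flags in the positive and negative groups and keeps patterns whose ratio is far from 1. Adding 0.5 to each cell to avoid division by zero, the `min_ratio` cut-off, the function names and the sample data are all assumptions.

```python
def odds_ratio(pos_flags, neg_flags):
    """Odds ratio of pattern presence, positive versus negative group,
    with 0.5 added to every cell of the 2x2 table."""
    a = sum(pos_flags)            # positive outcome, pattern present
    b = len(pos_flags) - a        # positive outcome, pattern absent
    c = sum(neg_flags)            # negative outcome, pattern present
    d = len(neg_flags) - c        # negative outcome, pattern absent
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def filter_patterns(presence, min_ratio):
    """Keep patterns associated with either outcome by at least min_ratio."""
    kept = {}
    for pattern, (pos_flags, neg_flags) in presence.items():
        ratio = odds_ratio(pos_flags, neg_flags)
        if ratio >= min_ratio or ratio <= 1 / min_ratio:
            kept[pattern] = ratio
    return kept


presence = {
    "Vital->Lab": ([1] * 8 + [0] * 2, [1] * 2 + [0] * 8),
    "Lab->Med": ([1] * 5 + [0] * 5, [1] * 5 + [0] * 5),
}
kept = filter_patterns(presence, min_ratio=3)
print(sorted(kept))  # ['Vital->Lab']
```

Patterns that survive this filter would then be passed on to the outcome analysis, e.g., a sparse logistic regression.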

Visual interface 104 is configured to visually represent the process model (from pathway extraction 124) and discriminative patterns (from pattern extraction 126) as an output 116. The visual interface 104 may involve one or more displays 106 and/or user interfaces 108. Preferably, the top k discriminative patterns are overlaid over the mined model, where k is any positive integer specified by a user. Discriminative patterns are distinguished from the process model, e.g., by color. Discriminative patterns may also be distinguished to indicate patient outcome (e.g., green for positive outcome, red for negative outcome, etc.). Other representations of the discriminative patterns are also contemplated, such as, e.g., line thickness, highlights, box, dimming areas outside the discriminative patterns, etc.

Referring for a moment to FIG. 3, a discriminative pattern is laid over a process model, in accordance with one illustrative embodiment. The discriminative pattern is identified in box 302, as the pattern from vital to BilirubinDirect. The discriminative pattern may be correlated with a positive patient outcome. For example, in one embodiment, the nodes of the discriminative pattern and edges connecting the nodes may be colored green to indicate the positive patient outcome.

The present principles provide a visual overlay of the discriminative patterns on a process model to enable a user to identify one or more discriminative patterns correlated with outcomes (e.g., positive or negative) in the context of the end-to-end clinical care pathways. Additionally, users can see the key clinical practice pathways correlated with outcomes on the mined model that represents an aggregation of all the clinical practice pathways. Insight can be obtained by comparing and contrasting separate overlays of clinical care pathways correlated to positive patient outcomes and negative patient outcomes.

Referring now to FIG. 4, a block/flow diagram showing a method forextracting clinical care pathways correlated with outcomes 400 isdepicted in accordance with one illustrative embodiment. In block 402,patient traces are constructed as a set of medical events for eachpatient. Patient medical information may be hierarchically stored asmedical events, which may include, e.g., medications, labs, diagnoses,vital signs, etc. Patient traces may correspond to patient outcomes,which may be segmented, e.g., into positive and negative outcomes.

In block 404, the patient traces are processed to reduce a number of events in a patient trace. In block 406, events of the patient traces are filtered. Filtering may include, e.g., replacing event names with a hierarchical categorical name, filtering events of a patient trace by type or attribute, etc.
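
By way of a non-limiting illustration, the filtering described above might be sketched as follows. The event tuple layout, the `CATEGORY_OF` mapping, and the function name `filter_trace` are assumptions introduced only for this example and are not part of the described system.

```python
# Hypothetical hierarchy mapping a specific event name to a categorical name.
CATEGORY_OF = {
    "Furosemide": "Diuretic",
    "Bumetanide": "Diuretic",
    "BilirubinDirect": "LiverPanel",
}

def filter_trace(trace, category_of, drop_types=frozenset()):
    """Filter one patient trace.

    trace: list of (timestamp, event_type, event_name) tuples (assumed layout).
    Events whose type is in drop_types are removed; remaining event names are
    replaced by their categorical name when one is known.
    """
    filtered = []
    for timestamp, event_type, name in trace:
        if event_type in drop_types:
            continue  # filter by type
        filtered.append((timestamp, event_type, category_of.get(name, name)))
    return filtered

trace = [
    (1, "medication", "Furosemide"),
    (2, "vital", "HeartRate"),
    (3, "lab", "BilirubinDirect"),
]
filtered = filter_trace(trace, CATEGORY_OF, drop_types={"vital"})
```

Mapping several specific names to one categorical name reduces the diversity of events before mining.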

In block 408, events of a concurrent event may be aggregated to reduce the number of events in the concurrent event. Events occurring within a predefined time period (e.g., one day) may be represented as concurrent events. The number of events in a concurrent event may be reduced by first identifying event packages from the events in the concurrent event (e.g., by frequent itemset mining). A two-way sorting approach may be applied by first sorting event packages according to cardinality, and then sorting event packages with the same cardinality by appearance frequency. The event package with the largest cardinality is selected as a super event. If multiple event packages have the same largest cardinality, the event package among them that has the highest appearance frequency is selected as the super event. The process is repeated for the remaining events of the concurrent event.
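
By way of a non-limiting illustration, the two-way sorting selection described above might be sketched as follows. The sketch assumes the event packages and their appearance frequencies have already been produced by a separate frequent itemset mining step. The function name `aggregate_concurrent` and the fallback of keeping uncovered events as single-event packages are assumptions made for the example.

```python
def aggregate_concurrent(events, package_freq):
    """Greedily cover one concurrent event with super events.

    events: set of event names occurring in the concurrent event.
    package_freq: dict mapping frozenset(event package) -> appearance
        frequency (assumed output of a prior frequent itemset mining step).
    Returns the selected super events in selection order.
    """
    remaining = set(events)
    super_events = []
    while remaining:
        # Only packages fully contained in the remaining events are eligible.
        candidates = [p for p in package_freq if p and p <= remaining]
        if not candidates:
            # Assumption: leftover events are kept as single-event packages.
            super_events.extend(frozenset([e]) for e in sorted(remaining))
            break
        # Two-way ordering: cardinality first, appearance frequency second.
        best = max(candidates, key=lambda p: (len(p), package_freq[p]))
        super_events.append(best)
        remaining -= best
    return super_events

packages = {
    frozenset({"A", "B", "C"}): 5,
    frozenset({"A", "B", "D"}): 9,
    frozenset({"C", "E"}): 4,
    frozenset({"E"}): 20,
}
result = aggregate_concurrent({"A", "B", "C", "D", "E"}, packages)
```

In this example the two three-event packages tie on cardinality, so the one with the higher appearance frequency is chosen first, and the selection then repeats on the remaining events.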

In block 410, consecutive events of a same type are consolidated. Consecutive events of a same type preferably include consecutive identical events. The consolidated event may be distinguished by, e.g., adding the prefix “Rep,” adding the suffix “- Repeat,” etc.
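
By way of a non-limiting illustration, consolidating consecutive identical events might be sketched as follows. The function name `consolidate` and the representation of a trace as a list of event names are assumptions made for the example.

```python
from itertools import groupby

def consolidate(events, suffix=" - Repeat"):
    """Collapse each run of consecutive identical events into one event.

    A run longer than one event is marked with the given suffix so that the
    consolidated event remains distinguishable from a single occurrence.
    """
    consolidated = []
    for name, run in groupby(events):
        run_length = sum(1 for _ in run)
        consolidated.append(name + suffix if run_length > 1 else name)
    return consolidated

collapsed = consolidate(["Lab", "Lab", "Lab", "Med", "Lab"])
```

Only adjacent repetitions are collapsed; the final "Lab" in the example is kept as a separate event because it does not directly follow another "Lab".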

In block 412, the patient traces are clustered. Clustering may include transforming patient traces into string-based representations and applying a string edit-distance metric. Clustering may include, e.g., DBSCAN, k-NN, etc. In block 414, a process model is constructed from a cluster of patient traces. The cluster may be the largest cluster, any cluster that includes a number of patient traces meeting or exceeding a threshold, etc. Constructing the process model may include adding a start event and end event in each patient trace. The start event has a timestamp that occurs before the earliest event in the patient trace. The end event has a timestamp that occurs after the latest event in the patient trace. The process model may be extracted by applying, e.g., HeuristicMiner.
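
By way of a non-limiting illustration, the edit-distance comparison of two patient traces and the addition of start and end events might be sketched as follows. The sketch assumes a Levenshtein distance computed over sequences of event names and numeric timestamps. The names `edit_distance`, `add_start_end`, "START" and "END" are assumptions made for the example, and the clustering algorithm itself is omitted.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of event names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def add_start_end(trace):
    """Add artificial start and end events to a non-empty trace.

    trace: list of (timestamp, event_name) with numeric timestamps (assumed).
    The start event is stamped before the earliest event and the end event
    after the latest event.
    """
    times = [t for t, _ in trace]
    return ([(min(times) - 1, "START")]
            + sorted(trace)
            + [(max(times) + 1, "END")])

d = edit_distance(["vital", "lab", "med"], ["vital", "med"])
bounded = add_start_end([(5, "lab"), (3, "vital")])
```

The pairwise distances computed this way could be supplied to a clustering method that accepts a precomputed distance matrix.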

In block 416, the process model is refined. In one embodiment, refining includes defining a dependency edge between two repeating event nodes according to the frequencies of the directions of dependencies for all of the repeating event nodes. The frequencies of the directions of dependencies may be compared to a threshold to define the direction of the dependency edge. In another embodiment, refining includes employing a minimum number of observations for showing a patient trace on the process model. Other methods of refining are also contemplated.
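
By way of a non-limiting illustration, one way a direction might be chosen for an edge between two repeating event nodes is sketched below. The text does not specify how the threshold is applied; the sketch assumes the threshold is a minimum fraction (greater than 0.5) of observed dependencies in one direction. The function name `edge_direction` and the default value are assumptions made for the example.

```python
def edge_direction(node_a, node_b, count_ab, count_ba, threshold=0.6):
    """Pick a direction for the edge between two repeating event nodes.

    count_ab: number of observed dependencies node_a -> node_b.
    count_ba: number of observed dependencies node_b -> node_a.
    threshold: assumed minimum fraction (> 0.5) needed to fix a direction.
    Returns (source, target), or None when neither direction dominates.
    """
    total = count_ab + count_ba
    if total == 0:
        return None
    if count_ab / total >= threshold:
        return (node_a, node_b)
    if count_ba / total >= threshold:
        return (node_b, node_a)
    return None

chosen = edge_direction("Lab - Repeat", "Med - Repeat", 8, 2)
undecided = edge_direction("Lab - Repeat", "Med - Repeat", 5, 5)
```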

In block 418, patterns are extracted from patient traces. Preferably, patterns are extracted from the processed patient traces (in block 404). In another embodiment, patterns are extracted from a cluster of patient traces (in block 412). Pattern extraction may include any subsequence mining method, such as, e.g., PrefixSpan, SPAM, etc. In block 420, patterns are represented as bag-of-pattern vectors for each patient trace. Events for a patient trace are collected into a pattern dictionary, including event and frequency of event occurrence. The entries of the bag-of-pattern vector indicate the frequency of appearance for the corresponding event of a patient trace. In block 422, pattern pairs having same events are merged and the dependency between the events is ignored. The frequency indicated in the bag-of-pattern vector for the resulting merged event is equal to the sum of the frequencies of each individual pattern it merged from.
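
By way of a non-limiting illustration, building a bag-of-pattern representation for one patient trace and merging patterns having the same events might be sketched as follows. The sketch assumes the candidate patterns were mined beforehand and counts only contiguous occurrences, which is a simplification chosen for the example. The function names are likewise assumptions.

```python
from collections import Counter

def count_contiguous(trace, pattern):
    """Count contiguous occurrences of pattern in trace (a simplification)."""
    n, m = len(trace), len(pattern)
    return sum(1 for i in range(n - m + 1)
               if tuple(trace[i:i + m]) == tuple(pattern))

def bag_of_patterns(trace, patterns):
    """Map each candidate pattern to its frequency in one patient trace."""
    return {tuple(p): count_contiguous(trace, p) for p in patterns}

def merge_same_events(bag):
    """Merge patterns that contain the same events, ignoring their order.

    The merged frequency is the sum of the frequencies of the merged patterns.
    """
    merged = Counter()
    for pattern, freq in bag.items():
        merged[tuple(sorted(pattern))] += freq
    return dict(merged)

trace = ["A", "B", "A", "B", "A"]
bag = bag_of_patterns(trace, [("A", "B"), ("B", "A")])
merged = merge_same_events(bag)
```

Here the patterns A followed by B and B followed by A contain the same events, so they collapse into a single entry whose frequency is the sum of the two.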

In block 424, discriminative patterns are determined. Determining discriminative patterns may first include a preprocessing step to reduce the number of patterns based on, e.g., odds ratio, information gain, etc. Outcome analysis is performed on the remaining patterns by, e.g., sparse logistic regression to identify patterns most discriminative of a particular patient outcome.
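
By way of a non-limiting illustration, an odds-ratio based preprocessing step might be sketched as follows. The sketch assumes that, for each pattern, the number of positive-outcome and negative-outcome patients exhibiting the pattern is known. The 0.5 continuity correction, the `min_ratio` cutoff, and the function names are assumptions made for the example, and the subsequent sparse logistic regression is omitted.

```python
def odds_ratio(pos_with, pos_total, neg_with, neg_total):
    """Odds ratio of a pattern for positive versus negative outcomes.

    A 0.5 continuity correction is added to every cell (an assumption) so
    that the ratio stays finite when a cell count is zero.
    """
    a = pos_with + 0.5               # positive outcome, pattern present
    b = pos_total - pos_with + 0.5   # positive outcome, pattern absent
    c = neg_with + 0.5               # negative outcome, pattern present
    d = neg_total - neg_with + 0.5   # negative outcome, pattern absent
    return (a * d) / (b * c)

def prefilter(pattern_counts, pos_total, neg_total, min_ratio=2.0):
    """Keep patterns strongly associated with either outcome."""
    kept = []
    for pattern, (pos_with, neg_with) in pattern_counts.items():
        ratio = odds_ratio(pos_with, pos_total, neg_with, neg_total)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            kept.append(pattern)
    return kept

counts = {"P1": (30, 5), "P2": (20, 20), "P3": (2, 25)}
kept = prefilter(counts, pos_total=40, neg_total=40)
```

A pattern that appears equally often in both outcome groups has an odds ratio near one and is dropped before the outcome analysis.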

In block 426, the discriminative patterns are overlaid on the process model. Discriminative patterns may be represented to indicate patient outcome by, e.g., color.
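
By way of a non-limiting illustration, assigning display colors for an overlay might be sketched as follows. The sketch assumes the process model is given as a list of directed edges and the pattern as an ordered list of nodes. The base color "gray", the two-node example pattern, and the function name `overlay` are assumptions made for the example.

```python
OUTCOME_COLOR = {"positive": "green", "negative": "red"}
BASE_COLOR = "gray"  # assumed color for model elements outside the pattern

def overlay(model_edges, pattern, outcome):
    """Assign colors to nodes and edges of a process model.

    model_edges: list of (source, target) edges of the mined model.
    pattern: ordered list of nodes forming the discriminative pattern.
    Returns (node_colors, edge_colors) dictionaries.
    """
    color = OUTCOME_COLOR[outcome]
    pattern_edges = set(zip(pattern, pattern[1:]))
    edge_colors = {e: (color if e in pattern_edges else BASE_COLOR)
                   for e in model_edges}
    nodes = {n for e in model_edges for n in e}
    node_colors = {n: (color if n in pattern else BASE_COLOR) for n in nodes}
    return node_colors, edge_colors

edges = [("START", "vital"), ("vital", "BilirubinDirect"),
         ("BilirubinDirect", "END")]
node_colors, edge_colors = overlay(edges, ["vital", "BilirubinDirect"],
                                   "positive")
```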

Having described preferred embodiments of a system and method for extracting clinical care pathways correlated with outcomes (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

What is claimed is:
 1. A computer readable storage medium comprising a computer readable program for data analysis, wherein the computer readable program when executed on a computer causes the computer to perform the steps of: constructing patient traces as a set of medical events for each patient of a patient population, the patient population being segmented based on patient outcomes; reducing medical events in one or more of the patient traces to provide processed patient traces; clustering the processed patient traces to identify a cluster of patient traces; mining a process model, using a processor, representing an aggregation of treatment pathways in the patient traces from the cluster; identifying patterns from patient traces that are discriminative of patient outcomes; and representing at least one of the patterns with respect to the process model to identify treatment pathways correlated with the patient outcomes.
 2. A system for data analysis, comprising: a medical records database configured to construct patient traces stored on a computer readable storage medium as a set of medical events for each patient of a patient population, the patient population being segmented based on patient outcomes; a trace preprocess module configured to reduce medical events in one or more of the patient traces to provide processed patient traces; a cluster module configured to cluster the processed patient traces to identify a cluster of patient traces; a pathway extraction module configured to mine a process model representing an aggregation of treatment pathways in the patient traces from the cluster; a pattern extraction module configured to identify patterns from patient traces that are discriminative of patient outcomes; and a visual interface configured to represent at least one of the patterns with respect to the process model to identify treatment pathways correlated with the patient outcomes.
 3. The system as recited in claim 2, wherein the visual interface is further configured to display the at least one of the patterns overlaid on the process model.
 4. The system as recited in claim 2, wherein the visual interface is further configured to represent the at least one of the patterns with the process model based on the patient outcomes.
 5. The system as recited in claim 2, wherein the visual interface is further configured to highlight nodes of the at least one of the patterns and edges between the nodes.
 6. The system as recited in claim 2, wherein the cluster of patient traces includes at least one of a largest cluster of patient traces and a cluster having a number of patient traces meeting or exceeding a threshold number of patient traces.
 7. The system as recited in claim 2, wherein the cluster module is further configured to represent each patient trace of the processed patient traces as a string and compute a string edit distance between two patient traces of the processed patient traces to determine similarity between the two patient traces.
 8. The system as recited in claim 2, wherein the mining module is further configured to add a start event and an end event to each of the patient traces, wherein the start event has a timestamp earlier than all other events in its patient trace and the end event has a timestamp later than all other events in its patient trace.
 9. The system as recited in claim 2, wherein the mining module is further configured to define a dependency between repeating event node pairs according to a frequency of each direction of the dependencies from each of the repeating event node pairs.
 10. The system as recited in claim 2, wherein the mining module is further configured to represent medical events in the process model according to a frequency of appearance of medical events in the cluster of patient traces compared to a threshold.